Splitting Long Input Sentences for Phrase-based Statistical Machine Translation

Authors

  • Chooi-Ling Goh
  • Eiichiro Sumita
Abstract

Translation quality suffers when a standard phrase-based statistical machine translation system is used to translate long sentences, because the translation output often fails to preserve the word order of the source. When a sentence is long, it should be partitioned into several clauses, and word reordering during translation should be done within these clauses, not across them. In this paper, we propose splitting long sentences using linguistic information and translating each sentence piece by piece. In other words, we constrain word reordering so that it can only be done within the pieces, not between them. We then apply a language model to join the pieces back together in their original sequence, in order to reduce disfluencies at the connections. By doing so, word order is preserved and translation quality improves. In experiments on Japanese-to-English patent translation, the approach achieves better translations as measured by both BLEU score and word error rate (WER).
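
The piece-wise reordering constraint and the joining step can be illustrated with a small Python sketch. This is not the authors' implementation. The piece boundaries are given by hand instead of being derived from linguistic information, the "language model" is a toy table of bigram log-probabilities, and the names `respects_pieces`, `allowed_orders` and `join_pieces` are hypothetical. Reordering is modelled as the order in which source positions are translated, and an order is allowed only if it never returns to an earlier piece. Joining is assumed here to be a greedy choice among candidate translations of each piece, scored by the candidate's own score plus a bigram probability across the junction.

```python
from itertools import permutations
from math import log


def piece_of(position, boundaries):
    # Index of the piece containing source word `position`.
    # `boundaries` (hypothetical) holds the start index of every piece after the first.
    return sum(1 for b in boundaries if position >= b)


def respects_pieces(order, boundaries):
    # `order` is the sequence in which source positions are translated.
    # Allowed only if piece indices never decrease: words may be reordered
    # inside a piece, but pieces are translated in their original sequence.
    pieces = [piece_of(p, boundaries) for p in order]
    return all(a <= b for a, b in zip(pieces, pieces[1:]))


def allowed_orders(n_words, boundaries):
    # Brute-force enumeration of permitted orders; only feasible for toy inputs.
    return [o for o in permutations(range(n_words)) if respects_pieces(o, boundaries)]


def join_pieces(candidates, bigram_logprob, unk=log(1e-4)):
    # candidates[i] is a list of (tokens, model_logscore) for piece i.
    # Greedy left-to-right join (an assumed strategy): for each piece, pick the
    # candidate whose own score plus the bigram log-probability across the
    # junction with the previous piece is highest.
    output = []
    for cands in candidates:
        def total(cand):
            tokens, score = cand
            if not output:
                return score
            return score + bigram_logprob.get((output[-1], tokens[0]), unk)
        best_tokens, _ = max(cands, key=total)
        output.extend(best_tokens)
    return output


if __name__ == "__main__":
    # Four source words split into two pieces after the second word.
    print(len(list(permutations(range(4)))))  # 24 unconstrained orders
    print(len(allowed_orders(4, [2])))        # 4 = 2! * 2! constrained orders

    # Toy junction LM and candidate translations (invented for illustration).
    lm = {("patent", "is"): log(0.5), ("patent", "are"): log(0.01)}
    pieces = [[(["the", "patent"], 0.0)],
              [(["are", "valid"], -0.1), (["is", "valid"], -0.3)]]
    print(" ".join(join_pieces(pieces, lm)))  # the patent is valid
```

With a single split after the second word, only 4 of the 24 possible orders of a four-word sentence remain, which shows how the constraint rules out reordering across the split while still allowing it inside each piece. In the join, the lower-scoring candidate "is valid" wins because the junction bigram ("patent", "is") is far more probable than ("patent", "are").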


Similar resources

Rule-based Reordering Constraints for Phrase-based SMT

Translation results suffer when a standard phrase-based statistical machine translation system is used for translating long sentences. The translation output will not preserve the same word order as the source, especially between a language pair that has different syntactic structures. When a sentence is long, it should be partitioned into several clauses, and the word reordering during the tra...


A Hybrid Approach Using Phrases and Rules for Hindi to English Machine Translation

The present work focuses on developing a hybrid approach for developing a machine translation (MT) scheme for automatic translation of Hindi sentences to English. Development of machine translation (MT) systems for Indian languages to English almost invariably suffers from the limited availability of linguistic resources. As a consequence, statistical, rule-based or example-based approaches hav...


Statistical machine translation using large j/e parallel corpus and long phrase tables

Our statistical machine translation system that uses large Japanese-English parallel sentences and long phrase tables is described. We collected 698,973 Japanese-English parallel sentences, and we used long phrase tables. Also, we utilized general tools for statistical machine translation, such as "Giza++" [1], "moses" [2], and "training-phrasemodel.perl" [3]. We used these data and these tools, W...


Novel Reordering Approaches in Phrase-Based Statistical Machine Translation

This paper presents novel approaches to reordering in phrase-based statistical machine translation. We perform consistent reordering of source sentences in training and estimate a statistical translation model. Using this model, we follow a phrase-based monotonic machine translation approach, for which we develop an efficient and flexible reordering framework that allows to easily introduce dif...


A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation

We compare two pivot strategies for phrase-based statistical machine translation (SMT), namely phrase translation and sentence translation. The phrase translation strategy means that we directly construct a phrase translation table (phrase-table) of the source and target language pair from two phrase-tables; one constructed from the source language and English and one constructed from English a...




Publication date: 2011